{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9a71e8db-57f8-453b-80b4-a0c3890ed694",
   "metadata": {},
   "source": [
    "# Run your first InferenceService\n",
    "\n",
    "The InferenceService custom resource is the primary interface that is used for deploying models on KServe. Inside an InferenceService, users can specify multiple\n",
    "components that are used for handling inference requests. These components are the predictor, transformer, and explainer. Learn more [here](https://kserve.github.io/website/0.7/modelserving/data_plane/).\n",
    "\n",
    "In this tutorial, you will deploy an InferenceService with a predictor that will load a scikit-learn model trained with the [iris](https://archive.ics.uci.edu/ml/datasets/iris) dataset.\n",
    "This dataset has three output class: Iris Setosa, Iris Versicolour, and Iris Virginica.\n",
    "\n",
    "You will then send an inference request to your deployed model in order to get a prediction for the class of iris plant your request corresponds to.\n",
    "\n",
    "## Before you begin\n",
    "\n",
    "First, install the KServe SDK using the following command. If you run this command in a Jupyter notebook, restart the kernel after installing the SDK."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d482b3bb-f363-4ca7-a127-26ae24617881",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install kserve==0.15.0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "568b250b-102a-4dc9-a9a9-e17daf7e3a02",
   "metadata": {},
   "source": [
    "## Import `kubernetes.client` and `kserve` packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a38cbeda-eb60-4469-b861-559950886453",
   "metadata": {},
   "outputs": [],
   "source": [
    "from kubernetes import client \n",
    "from kserve import KServeClient\n",
    "from kserve import constants\n",
    "from kserve import utils\n",
    "from kserve import V1beta1InferenceService\n",
    "from kserve import V1beta1InferenceServiceSpec\n",
    "from kserve import V1beta1PredictorSpec\n",
    "from kserve import V1beta1SKLearnSpec"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8332633-5f3a-4557-96d5-1f72a2222af8",
   "metadata": {},
   "source": [
    "## Declare Namespace\n",
    "\n",
    "This will retrieve the current namespace of your Kubernetes context. The InferenceService will be deployed in this namespace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a621ac9-1847-4249-b12a-76cf23620534",
   "metadata": {},
   "outputs": [],
   "source": [
    "namespace = utils.get_default_target_namespace()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36933388-867e-4bfa-b4f4-7a1408e9ed34",
   "metadata": {},
   "source": [
    "## Define InferenceService"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07ded005-72c9-466c-80d7-38db5cfdec5d",
   "metadata": {},
   "source": [
    "Next, define the InferenceService based on several key parameters. In the `predictor` parameter, a `V1beta1PredictorSpec` object with an embedded `V1beta1SKLearnSpec` object is created.\n",
    "Inside the `V1beta1SKLearnSpec` object, a storage URI is provided, pointing to the location of the trained iris model in cloud storage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a7f9b69-a305-482b-a9e1-0b706ba11341",
   "metadata": {},
   "outputs": [],
   "source": [
    "name='sklearn-iris'\n",
    "kserve_version='v1beta1'\n",
    "api_version = constants.KSERVE_GROUP + '/' + kserve_version\n",
    "\n",
    "isvc = V1beta1InferenceService(api_version=api_version,\n",
    "                               kind=constants.KSERVE_KIND_INFERENCESERVICE,\n",
    "                               metadata=client.V1ObjectMeta(\n",
    "                                   name=name, namespace=namespace, annotations={'sidecar.istio.io/inject':'false'}),\n",
    "                               spec=V1beta1InferenceServiceSpec(\n",
    "                               predictor=V1beta1PredictorSpec(\n",
    "                               sklearn=(V1beta1SKLearnSpec(\n",
    "                                   storage_uri=\"gs://kfserving-examples/models/sklearn/1.0/model\"))))\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e433c47a-a2cf-46ba-9c5a-9f509867c2c8",
   "metadata": {},
   "source": [
    "## Create InferenceService\n",
    "\n",
    "Now, with the InferenceService defined, you can now create it by calling the `create` method of the `KServeClient`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8ef0696-7520-4476-96e8-778017f1d075",
   "metadata": {},
   "outputs": [],
   "source": [
    "KServe = KServeClient()\n",
    "KServe.create(isvc)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9995e8b-ff37-4ccb-b564-d4fcf5423069",
   "metadata": {},
   "source": [
    "## Check the InferenceService\n",
    "\n",
    "Run the following command to watch the InferenceService until it is ready (or times out)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a61e1fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "KServe.get(name, namespace=namespace, watch=True, timeout_seconds=120)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cfe3c80-3acc-4889-8a81-e85b0055fcde",
   "metadata": {},
   "source": [
    "## Perform Inference\n",
    "\n",
    "Next, you can try sending an inference request to the deployed model in order to get predictions. This notebook assumes that you running\n",
    "it in your Kubeflow cluster and will use the internal URL of the InferenceService.\n",
    "\n",
    "The Python `requests` library will be used to send a POST request containing your payload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb13187c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "isvc_resp = KServe.get(name, namespace=namespace)\n",
    "isvc_url = isvc_resp['status']['address']['url']\n",
    "\n",
    "print(isvc_url)\n",
    "\n",
    "inference_input = {\n",
    "  'instances': [\n",
    "    [6.8,  2.8,  4.8,  1.4],\n",
    "    [6.0,  3.4,  4.5,  1.6]\n",
    "  ]\n",
    "}\n",
    "\n",
    "response = requests.post(f\"{isvc_url}/v1/models/{name}:predict\", json=inference_input)\n",
    "print(response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b35f28f",
   "metadata": {},
   "source": [
    "You should see two predictions returned (i.e. `{\"predictions\": [1, 1]}`). Both sets of data points sent for inference correspond to the flower with index `1`.\n",
    "In this case, the model predicts that both flowers are \"Iris Versicolour\".\n",
    "\n",
    "To learn more about sending inference requests, please check out the [KServe guide](https://kserve.github.io/website/0.7/get_started/first_isvc/#3-determine-the-ingress-ip-and-ports)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91daa8b4-8ec3-4c09-b1eb-ad400d2e1c94",
   "metadata": {},
   "source": [
    "## Run Performance Test (Optional)\n",
    "\n",
    "If you want to load test the deployed model, try deploying the Kubernetes Job to drive load to the InferenceService."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "558d2704-62eb-43b5-a100-18ff3ae5c1f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "creation_output=!kubectl create -f https://raw.githubusercontent.com/kserve/kserve/release-0.15/docs/samples/v1beta1/sklearn/v1/perf.yaml -n {namespace}\n",
    "creation_output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a17377b8-30b8-43c2-a39c-da6aded3ff40",
   "metadata": {},
   "outputs": [],
   "source": [
    "job_name = creation_output[0].split(\"/\")[-1].split(\" \")[0]\n",
    "job_name"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7313b81-d810-49e9-896d-a66d020eaad4",
   "metadata": {},
   "source": [
    "### Get Job Name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09784a23-312b-4a20-b2ee-145e28ec09ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "get_pod_output=!kubectl get pods -n {namespace} | grep {job_name}\n",
    "get_pod_output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d6318ef-0360-4eae-8775-f1a1ac6e5fac",
   "metadata": {},
   "outputs": [],
   "source": [
    "pod_name=get_pod_output[0].split(\" \")[0]\n",
    "pod_name"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "580f9bda-52ab-4a42-8242-cb103be93ceb",
   "metadata": {},
   "source": [
    "### Check the Job Logs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "382e7176-513e-4f05-af3e-f3bdffc596e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "!kubectl logs {pod_name} -n {namespace}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac89bcc5-5c2f-4b1f-9b8c-cd3df2d349cf",
   "metadata": {},
   "source": [
    "The output should look like similar to the following:\n",
    "\n",
    "```\n",
    "Requests      [total, rate, throughput]         30000, 500.02, 499.99\n",
    "Duration      [total, attack, wait]             1m0s, 59.998s, 3.336ms\n",
    "Latencies     [min, mean, 50, 90, 95, 99, max]  1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms\n",
    "Bytes In      [total, mean]                     690000, 23.00\n",
    "Bytes Out     [total, mean]                     2460000, 82.00\n",
    "Success       [ratio]                           100.00%\n",
    "Status Codes  [code:count]                      200:30000\n",
    "Error Set:\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fed41ca",
   "metadata": {},
   "source": [
    "## Delete InferenceService\n",
    "\n",
    "When you are done with your InferenceService, you can delete it by running the following."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "685ca6d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "KServe.delete(name, namespace=namespace)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
